Concurrent deletion in a distributed content-addressable storage system with global deduplication

Authors

  • Przemyslaw Strzelczak
  • Elzbieta Adamczyk
  • Urszula Herman-Izycka
  • Jakub Sakowicz
  • Lukasz Slusarczyk
  • Jaroslaw Wrona
  • Cezary Dubnicki
Abstract

Scalable, highly reliable distributed systems supporting data deduplication have recently become popular for storing backup and archival data. One of the important requirements for backup storage is the ability to delete data selectively. Unlike in traditional storage systems, data deletion in distributed systems with deduplication is a major challenge because deduplication leads to multiple owners of data chunks. Moreover, the system configuration changes often because of node additions, deletions and failures. Expected high performance, high availability and low impact of deletion on regular user operations additionally complicate identification and reclamation of unnecessary blocks. This paper describes a deletion algorithm for a scalable, content-addressable storage with global deduplication. The deletion is concurrent: user reads and writes can proceed in parallel with deletion, with only minor restrictions established to make reclamation feasible. Moreover, our approach allows for deduplication of user writes during deletion. We extend traditional distributed reference counting to deliver a failure-tolerant deletion that accommodates not only deduplication, but also the dynamic nature of a scalable system and its physical resource constraints. The proposed algorithm has been verified with an implementation in a commercial deduplicating storage system. The impact of deletion on user operations is configurable. With the default setting, which grants deletion at most 30% of system resources, running the deletion reduces end performance by no more than 30%. This impact can be reduced to less than 5% when deletion is given only minimal resources.
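
The abstract's point that deduplication gives a chunk multiple owners can be illustrated with a minimal single-node sketch of reference counting over a content-addressable store. This is not the paper's algorithm: it ignores distribution, concurrency, and failures, and the names `ChunkStore`, `write`, `release`, and `reclaim` are hypothetical, chosen only for illustration.

```python
import hashlib


class ChunkStore:
    """Toy content-addressable store with per-chunk reference counts.

    Illustrative assumption only: one node, no concurrency, no failures.
    """

    def __init__(self):
        self.chunks = {}    # address -> chunk bytes
        self.refcount = {}  # address -> number of owners

    def write(self, data: bytes) -> str:
        # The address is derived from the content, so identical data
        # deduplicates to a single stored chunk with several owners.
        address = hashlib.sha256(data).hexdigest()
        if address not in self.chunks:
            self.chunks[address] = data
        self.refcount[address] = self.refcount.get(address, 0) + 1
        return address

    def release(self, address: str) -> None:
        # One owner gives up its reference; the chunk itself stays
        # in place until reclaim() runs.
        if self.refcount.get(address, 0) <= 0:
            raise KeyError(f"no live reference to {address}")
        self.refcount[address] -= 1

    def reclaim(self) -> int:
        # Remove every chunk that no owner references any more and
        # report how many chunks were freed.
        dead = [a for a, n in self.refcount.items() if n == 0]
        for address in dead:
            del self.chunks[address]
            del self.refcount[address]
        return len(dead)
```

In this sketch, writing the same bytes twice yields one stored chunk with a count of two, so the chunk is reclaimed only after both owners release it.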

Similar resources

Moving from Logical Sharing of Guest OS to Physical Sharing of Deduplication on Virtual Machine

Current OSes include many logical sharing techniques (shared library, symbolic link, etc.) on memory and storage. Unfortunately they cause security and management problems which come from the dynamic management of logical sharing; e.g., search path replacement attack, GOT (Global Offset Table) overwrite attack, Dependency Hell, etc. This paper proposes that self-contained binaries eliminate the...

Efficient and Safe Data Backup with Arrow

We describe Arrow, an efficient, safe data backup system for computer networks. Arrow employs techniques of delta compression (or deduplication) to achieve efficient storage and bandwidth utilization, and collision-resistant hashing and error-correction coding to protect against and correct storage errors. keywords: content-addressable storage; error-correcting storage systems; data backup; ded...

Analysis of Disk Access Patterns on File Systems for Content Addressable Storage

CAS (Content Addressable Storage) is a virtual disk with deduplication, which merges same-content chunks and reduces the consumption of physical storage. The performance of CAS depends on the allocation strategy of the individual file system and its access patterns (size, frequency, and locality of reference) since the effect of merging depends on the size of a chunk (access unit) used in dedupli...

Storage Deduplication and Management for Application Testing over a Virtual Network Testbed

Using virtual machine technologies, the Virtual Ad hoc Network (VAN) testbed was designed to evaluate functional correctness and communication performance of Mobile Ad hoc Network (MANET) applications. When VAN is used for large-scale testing that requires hundreds of virtual machines, storage redundancy becomes an issue. Although Content Addressable Storage (CAS) techniques were designed to add...

Experiences with Content Addressable Storage and Virtual Disks

Efficiently managing storage is important for virtualized computing environments. Its importance is magnified by developments such as cloud computing which consolidate many thousands of virtual machines (and their associated storage). The nature of this storage is such that there is a large amount of duplication between otherwise discrete virtual machines. Building upon previous work in content...

Publication date: 2013